Much effort has gone into automatically extracting information from MRDs and converting or incorporating it into lexical entries for NLP systems, and some headway has been made in acquiring lexical information in this way. Initial attempts focused on identifying morphological and syntactic information (e.g. Byrd 1983, Boguraev and Briscoe 1987), but more recent research has also addressed the extraction of lexical semantic information. This research has largely concentrated on identifying taxonomies (e.g. Chodorow 1985, Lesk 1986, Vossen et al. 1989, head finding as described in Byrd et al. 1987, Calzolari 1991), although some work on identifying related sets of words has also been undertaken (e.g. filtering as described in Byrd 1987). Taxonomies have been identified largely by analysing the dictionary entry for a word to find the genus term of its definition, and thereby establishing hypernym relations between entries.
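The genus-term approach can be illustrated with a minimal sketch. It is not the procedure of any of the works cited above: the toy definitions, the boundary-word list, and the function names `genus_term` and `hypernym_chain` are all assumptions made for illustration. The heuristic simply takes the last word of the definition's initial noun phrase as the genus and follows genus terms from entry to entry.

```python
import re

# Toy definitions invented for illustration; not taken from any real dictionary.
TOY_DEFINITIONS = {
    "spaniel": "a dog with long ears",
    "dog": "an animal that barks",
    "animal": "a living creature that can move",
    "oak": "a large tree with hard wood",
}

DETERMINERS = {"a", "an", "the", "any", "some"}
# Assumption: these words mark the end of the definition's initial noun phrase.
BOUNDARY_WORDS = {"that", "which", "who", "with", "of", "for", "in", "used", "having"}


def genus_term(definition):
    """Guess the genus term: the last non-determiner word before a boundary word."""
    tokens = re.findall(r"[a-z]+", definition.lower())
    head_phrase = []
    for token in tokens:
        if token in BOUNDARY_WORDS:
            break
        head_phrase.append(token)
    content = [t for t in head_phrase if t not in DETERMINERS]
    return content[-1] if content else None


def hypernym_chain(word, definitions):
    """Follow genus terms from entry to entry, stopping at cycles or missing entries."""
    chain = [word]
    seen = {word}
    while word in definitions:
        genus = genus_term(definitions[word])
        if genus is None or genus in seen:
            break
        chain.append(genus)
        seen.add(genus)
        word = genus
    return chain


print(hypernym_chain("spaniel", TOY_DEFINITIONS))
```

Even on this toy data the heuristic depends on an ad hoc boundary-word list and ignores word senses entirely, which hints at why results on unrestricted definition text are limited.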
However, several problems with using MRDs are repeatedly identified in the literature. They can be summarised as follows:
Much MRD research (see, inter alia, the articles in Boguraev and Briscoe 1989a and Zernik 1991) has focused on the machine readable version of the Longman Dictionary of Contemporary English (LDOCE) because it seems to overcome some of the problems with MRDs to a certain degree. It does so by (1) limiting the definition vocabulary to a core set of words which is used consistently, (2) marking the words in definitions with a sense number so that the exact intended meaning is specified, and (3) providing ``subject'' codes, which indicate the domain in which a word sense is most likely to appear, together with some selectional restrictions. However, these codes are used inconsistently and are incomplete (Boguraev and Briscoe 1989b:17), and so little use has been made of them.
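To make the three properties concrete, the following sketch shows one way such an entry might be represented and checked. It does not reproduce the actual LDOCE tape format: the `Sense` class, the core vocabulary, the ``ZOOLOGY'' label, and the sample entries are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical core vocabulary; a real controlled vocabulary is far larger.
CORE_VOCABULARY = {"a", "an", "animal", "with", "four", "legs", "kept", "as", "pet"}


@dataclass
class Sense:
    headword: str
    sense_number: int
    # Definition as (word, sense number) pairs; None means no sense is marked.
    definition: List[Tuple[str, Optional[int]]]
    # Invented domain labels, not real dictionary codes.
    subject_codes: List[str] = field(default_factory=list)


def words_outside_core(sense, core=CORE_VOCABULARY):
    """Return the definition words that are not in the core vocabulary."""
    return [word for word, _ in sense.definition if word.lower() not in core]


dog = Sense(
    headword="dog",
    sense_number=1,
    definition=[("an", None), ("animal", 1), ("with", None), ("four", None),
                ("legs", 1), ("kept", None), ("as", None), ("a", None), ("pet", 1)],
    subject_codes=["ZOOLOGY"],
)
spaniel = Sense(
    headword="spaniel",
    sense_number=1,
    definition=[("a", None), ("dog", 1), ("with", None), ("long", None), ("ears", 1)],
)

print(words_outside_core(dog))      # every word is in the toy core vocabulary
print(words_outside_core(spaniel))  # words the toy core vocabulary does not cover
```

Because each definition word carries an explicit sense number, a program can look up the intended sense directly rather than disambiguating free text, which is the advantage the paragraph above attributes to LDOCE.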
Work with LDOCE has achieved better results than other MRD research because it does not involve completely free text analysis. Despite this, the amount of semantic information useful for NLP which has been automatically extracted from dictionary definitions is severely limited. The construction of semantic taxonomies is clearly very important for NLP systems. However, if these taxonomies are derived from a source which makes artificial divisions between word senses in some cases and conflates word senses which might have linguistically significant differences in others, their utility for precise interpretation seems questionable. Consider the causative and unaccusative forms of a verb like roll, which some dictionaries describe under a single sense. Even extraction of subcategorisation information cannot always proceed systematically from MRDs, so their utility for establishing more complex semantic lexicons is certainly in doubt.
Because dictionaries are written with a human user in mind -- even if the vocabulary in the definitions is restricted to a subset of natural language -- they leave much of a speaker's knowledge of a word or concept unexpressed, relying on world knowledge, understanding of the general context in which a word appears, and cognitive processes such as analogy to fill in the gaps. Nor do they need to mention explicitly the ``linguistically-relevant'' components of the meaning of a word. These are identified by a language user on the basis of perceived similarity between words, in ways which correspond not necessarily to general semantic relatedness but rather to shared lexical entailments of the kind discussed in Chapter 2. For an NLP system to use and interpret words appropriately, however, these lexical relations must be made explicit in the lexicon. For these reasons, MRDs are not adequate on their own as a source of lexical knowledge for computational systems.